Abstract

In 1975, Valiant showed that Boolean matrix multiplication can be used for parsing context-free
grammars (CFGs), yielding the asymptotically fastest (although not practical) CFG parsing
algorithm known. We prove a dual result: any CFG parser with time complexity $O(gn^{3-\epsilon})$,
where g is the size of the grammar and n is the length of the input string, can be efficiently
converted into an algorithm to multiply m × m Boolean matrices in time $O(m^{3-\epsilon/3})$. Given
that practical, substantially sub-cubic Boolean matrix multiplication algorithms have been quite
difficult to find, we thus explain why there has been little progress in developing practical,
substantially sub-cubic general CFG parsers. In proving this result, we also develop a formalization
of the notion of parsing.


1 Introduction 


The context-free grammar (CFG) formalism, introduced by Chomsky (1956), has enjoyed wide use
in a variety of fields. CFGs have been used to model the structure of programming languages,
human languages, and even biological data such as the sequences of nucleotides making up DNA
and RNA (Aho, Sethi, and Ullman, 1986; Jurafsky and Martin, 2000; Durbin et al., 1998).

CFGs are generative systems, where strings are derived via successive applications of rewriting 
rules. In practice, however, the goal generally is not to generate valid strings from a grammar. 
Rather, one typically already has some string of interest, such as a C program or an English 
sentence, in hand, and the goal is to analyze — parse — the string with respect to the grammar. 
Canonical methods for general CFG parsing are the CKY algorithm (Kasami, 1965; Younger,
1967) and Earley’s algorithm (Earley, 1970). Both have a worst-case running time of $O(gn^3)$ for
a CFG of size g and string of length n (Graham, Harrison, and Ruzzo, 1980), although CKY
requires the input grammar to be in Chomsky normal form in order to achieve this time bound.
Unfortunately, cubic dependence on the string length is prohibitively expensive in applications such 
as speech recognition, where responses must be made in real time, or in situations where the input 
sequences are very long, as in computational biology. 

Asymptotically faster parsing algorithms do exist. Graham, Harrison, and Ruzzo (1980) give a
variant of Earley’s algorithm that is based on the so-called “four Russians” algorithm (Arlazarov
et al., 1970) for Boolean matrix multiplication (BMM); it runs in time $O(gn^3/\log n)$. Rytter (1985)
further modifies this parser by a compression technique, improving the dependence on the string
length to $O(n^3/\log^2 n)$. But Valiant’s (1975) parsing method, which reorganizes the computations
of CKY, is the asymptotically fastest known. It also uses BMM; its worst-case running time for a
grammar in Chomsky normal form is proportional to $M(n)$, where $M(m)$ is the time it takes to
multiply two m × m Boolean matrices together.

Since these subcubic parsing algorithms all depend on Boolean matrix multiplication, it is
natural to ask how fast BMM can be performed in practice. The asymptotically fastest way
known to perform BMM is to rely on algorithms for multiplying arbitrary matrices. There exist
matrix multiplication algorithms with time complexity $O(m^{3-\delta})$, thus improving over the standard
algorithm’s $O(m^3)$ running time; for instance, Strassen’s (1969) has a worst-case running time of
$O(m^{2.81})$, and the fastest currently known, due to Coppersmith and Winograd (1987; 1990), has time
complexity $O(m^{2.376})$. (See Strassen (1990) for a historical account, plotted graphically in Figure
1.) Unfortunately, the constants involved in the subcubic algorithms improving on Strassen’s result
are so large that these fast algorithms cannot be used in practice. As for Strassen’s method itself,
its practicality is ambiguous; empirical studies show that the “cross-over” point (the matrix size
at which it becomes better to use Strassen’s method) is above 100 (Bailey, 1988; Thottethodi,
Chatterjee, and Lebeck, 1998). In summary, despite decades of research effort, there has been little
success at finding a clearly practical, simple, fast matrix multiplication algorithm.

Figure 1: Lowest known upper bound on the exponent $\omega$ for the complexity of matrix
multiplication. For instance, before 1969, the fastest known algorithm for matrix multiplication took
time proportional to $m^3$ steps ($\omega = 3$).

One might therefore hope to find a way to speed up CFG parsing without relying on matrix 
multiplication. However, the main theorem of this paper is that fast CFG parsing requires fast 
Boolean matrix multiplication, in the following precise sense: any parser running in time $O(gn^{3-\epsilon})$
that represents parse data in a retrieval-efficient way can be converted with little computational
overhead into an $O(m^{3-\epsilon/3})$ BMM algorithm.

The restriction of our result to parsers with a linear dependence on the grammar size is crucial for 
relating sub-cubic parsing to sub-cubic BMM. However, as discussed in section 2.3, this restriction 
is a reasonable one since canonical parsing algorithms such as CKY and Earley’s algorithm have 
this property, and furthermore, in domains like natural language processing, the grammar size is 
often the dominating factor. 

Our theorem, together with the fact that it has been quite difficult to find practical fast matrix 
multiplication algorithms, explains why there has been little success to date in developing practical 
CFG parsers running in substantially sub-cubic time. 


2 The parsing problem: a formalization 

In this section, we motivate and set forth a formalization of the parsing problem.

Figure 2: Two different parse trees for the sentence “List all flights on Tuesday”. The labels on
the interior nodes denote linguistic categories.


2.1 Motivation for our definition 


In formal language theory, emphasis has been placed on the recognition or membership problem: 
deciding whether or not a given string can be derived by a grammar. However, we concentrate here 
on the parsing problem: finding the parse structure, or analysis, assigned to a string by a grammar. 
(In the case of ambiguous strings, multiple parses exist; we address this point below.) 

From a theoretical standpoint, the two problems are almost equivalent. Recognition obviously
reduces to parsing, and indeed to our knowledge there are no CFG recognition algorithms that do
not implicitly compute parse information. Conversely, Ruzzo (1979) demonstrated that any CFG
recognition algorithm that is not already an implicit parser can be converted into an algorithm that
returns a (single) parse of the input string w, at a cost of only a factor of $O(\log |w|)$ slowdown.

In practice, however, the parsing problem is much more compelling than the membership problem.
Understanding the structure of the input string is crucial to programming language compilation,
natural language understanding, RNA shape determination, and so on. In fact, in speech
recognition systems, a useful assumption is that any input utterance is somehow “valid”, even if
it is ungrammatical, thus making the recognition problem trivial. However, different parses of the
input sentence may lead to radically different interpretations. For example, the classic sentence
“List all flights on Tuesday” has two different parses (see Figure 2): one indicates that all flights
taking off on Tuesday should be listed right now, whereas the other asks to wait until Tuesday, and 
then list all flights regardless of their departure date. Another well-known ambiguous sentence is 
“I saw the man with the telescope”; observe that here the two possible interpretations seem to be 
about equally likely. 

The fact that some input strings are ambiguous raises the question of what we should require 
the output of a parsing algorithm to be: any single parse of the input string (Ruzzo’s reduction of 
parsing to recognition uses this model), or all possible parses? In practice, since multiple analyses 
may be valid (as in the natural language examples above), it is clear that any practical parser 
should return all parses. 

It remains to determine what the format of the output parses should be. One problem is that
there exist grammars in which the number of parse trees for strings of length n grows exponentially
in n; for example, consider the Chomsky normal form CFG with productions $S \to SS \mid a$.^1 Hence, a
compressed representation of the parse structures must be used; otherwise, every parser could take
exponential time just to print its output. However, we must be careful to impose restrictions on
the compression rate: after all, we could perversely consider the input string itself to be a (rather
inconvenient) representation of all its parse trees (Ruzzo, 1979). We thus require practical parsers
to output all the parses of an input string in a representation that is both compact and yet allows
efficient retrieval of parse information. In the next subsection, we make this notion precise.

^1 If we do not impose any restrictions on the form of the grammar, then an infinite number of parse trees can be
produced for a single string; for example, consider the production set $S \to S \mid a$.
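
To make the exponential growth concrete, the following minimal Python sketch counts the parse
trees that the grammar S → SS | a assigns to a string of n copies of a. The function name
`count_parses` is ours and purely illustrative; the counts it produces are the Catalan numbers,
which grow exponentially in n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parses(n: int) -> int:
    """Number of parse trees for a^n under the grammar S -> S S | a."""
    if n < 1:
        return 0
    if n == 1:
        return 1  # only the production S -> a applies
    # S -> S S: choose the length k of the substring covered by the left child.
    return sum(count_parses(k) * count_parses(n - k) for k in range(1, n))


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10, 20):
        print(n, count_parses(n))
```

Printing every tree for even moderately long strings is therefore infeasible, which is why a
compressed representation is needed.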


2.2 C-parsing of context-free grammars 

We use the usual definition of a context-free grammar (CFG) as a 4-tuple $G = (\Sigma, V, R, S)$, where
$\Sigma$ is the set of terminals, V is the set of nonterminals, R is the set of rewrite rules or productions,
and $S \in V$ is the start symbol. Given a string $w = w_1 w_2 \cdots w_n$ in $\Sigma^*$, where each $w_i$ is an element
of $\Sigma$, we use the notation $w_i^j$ to denote the substring $w_i w_{i+1} \cdots w_{j-1} w_j$. The size of G, denoted by
$|G|$, is the sum of the lengths of all productions in R.

Our notion of necessary parse information is based on the concept of CFG c-derivations, which 
are substring derivations that are consistent with some parse of the entire input string. 


Definition 1 Let $G = (\Sigma, V, R, S)$ be a CFG, and let $w = w_1 w_2 \cdots w_n$, $w_i \in \Sigma$. A nonterminal
$A \in V$ c-derives (consistently derives) $w_i^j$ if and only if the following conditions hold:

• $A \Rightarrow^* w_i^j$, and

• $S \Rightarrow^* w_1^{i-1} A w_{j+1}^n$.

(These conditions together imply that $S \Rightarrow^* w$.)


We argue, as do Ruzzo (1979) and, for a different formalism, Satta (1994), that a practical
parser must create output from which c-derivation information can be retrieved efficiently. This
information is what allows us to ascertain that there exists an analysis of the input sequence for
which a certain substring forms a constituent, or coherent unit. In contrast, derivation information
records potential subderivations that may not be consistent with any analysis of the full input string.
For example, in the sentence “Only the lonely can play”, “the lonely can” could conceivably, in 
isolation, form a noun phrase, but clearly in any reasonable grammar of English no nonterminal 
c-derives that substring. While some parsers retain information about derivations that are not 
c-derivations, we formulate our definition of parsing to include algorithms that do not. 


Definition 2 A c-parser is an algorithm that takes a CFG $G = (\Sigma, V, R, S)$ and string $w \in \Sigma^*$ as
input and produces output $\mathcal{F}_{G,w}$ that acts as an oracle about parse information as follows: for any
$A \in V$,

• If A c-derives $w_i^j$, then $\mathcal{F}_{G,w}(A, i, j)$ = “yes”.

• If $A \not\Rightarrow^* w_i^j$ (which implies that A does not c-derive $w_i^j$), then $\mathcal{F}_{G,w}(A, i, j)$ = “no”.

• $\mathcal{F}_{G,w}$ answers queries in constant time.

The asymmetry of derivation and c-derivation in our definition of c-parsing is deliberate. We
allow $\mathcal{F}_{G,w}$’s answer to be arbitrary if $A \Rightarrow^* w_i^j$ but A does not c-derive $w_i^j$; we leave it to the
algorithm designer to decide which answer is appropriate. Thus, our definition makes the class
of c-parsers as broad as possible: if we had changed the first condition to “If A derives $w_i^j$ ...”,
then Earley parsers would be excluded, since they do not keep track of all substring derivations;
whereas if we had written the second condition as “If A does not c-derive $w_i^j$, ...”, then CKY
would not be a c-parser, since it tracks all substring derivations, not just c-derivations. In fact, the
class of c-parsers contains all tabular parsers, including generalized LR parsing, CKY, and Earley’s
algorithm (Nederhof and Satta, 1996). In contrast, Ruzzo (1979) deals with the difference between
derivations and c-derivations by defining two different problems (the weak all-parses problem and
the all-parses problem).
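
As an illustration of Definition 2, here is a minimal Python sketch of one possible c-parser for
grammars in Chomsky normal form. It is not taken from any of the cited algorithms: the function
name `c_parse`, the rule encoding as tuples, and the two-pass design (a CKY-style bottom-up pass
followed by a top-down filter) are our own assumptions. This particular oracle answers “yes”
exactly for c-derivations, which is one of the choices the definition permits, and each query is
a single hash-set lookup (expected constant time).

```python
def c_parse(binary_rules, lexical_rules, start, w):
    """Return an oracle F(A, i, j) for a CNF grammar and a string w.

    binary_rules:  iterable of (A, B, C), meaning A -> B C
    lexical_rules: iterable of (A, a), meaning A -> a
    Positions are 1-based and inclusive, as in the text.
    """
    binary_rules, lexical_rules = list(binary_rules), list(lexical_rules)
    n = len(w)
    # Bottom-up pass: inside[(i, j)] holds every A that derives w_i ... w_j.
    inside = {(i, i): {A for (A, a) in lexical_rules if a == w[i - 1]}
              for i in range(1, n + 1)}
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            inside[(i, j)] = {A for k in range(i, j) for (A, B, C) in binary_rules
                              if B in inside[(i, k)] and C in inside[(k + 1, j)]}
    # Top-down pass: keep only items that occur in some parse of the whole string.
    consistent = set()
    stack = [(start, 1, n)] if n >= 1 and start in inside[(1, n)] else []
    while stack:
        item = stack.pop()
        if item in consistent:
            continue
        consistent.add(item)
        A, i, j = item
        for k in range(i, j):
            for (lhs, B, C) in binary_rules:
                if lhs == A and B in inside[(i, k)] and C in inside[(k + 1, j)]:
                    stack += [(B, i, k), (C, k + 1, j)]

    def oracle(A, i, j):
        return "yes" if (A, i, j) in consistent else "no"

    return oracle


if __name__ == "__main__":
    binary = [("S", "A", "B")]
    lexical = [("A", "a"), ("B", "b"), ("X", "a")]
    F = c_parse(binary, lexical, "S", "ab")
    print(F("A", 1, 1), F("X", 1, 1))
```

In the usage example, X derives the first symbol but never appears in a parse of the whole
string, so it derives that substring without c-deriving it; an oracle is free to answer either
way on such queries.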

Our choice of an oracle rather than a specific data structure as the output of a c-parser is also 
for the purpose of keeping our definition as broad as possible. In tabular algorithms like CKY, the 
oracle is given in the form of a matrix or chart; indeed, Ruzzo’s (1979) definition of the all-parses
and weak all-parses problems requires the output to be a matrix. However (as Ruzzo points out),
this is not the only possibility, and furthermore has a liability from a technical point of view: if
the output must be a matrix, then all parsing algorithms must take time at least $\Omega(n^2)$ even to
print their output. Since it may be possible for c-derivations to be represented more compactly, we
prefer to allow for this possibility in our definition.

Finally, with regard to the third condition, we observe that Satta (1994) imposes the same
constant-time constraint for a different grammar formalism (tree-adjoining grammars). On the
other hand, we could loosen this to allow query processing to take time polylogarithmic in the
string and grammar size without much effect on our results (see section 3.5).


2.3 Analyzing parser runtimes 

It is common in the formal language theory literature to see the running time of parsing algorithms 
described as a function of the length of the input string only (e.g., $O(n^3)$ for a string of length n).
That is, the size of the context-free grammar is often treated as a constant. This stems in part 
from two characteristics of the programming languages and compilers domains: first, the size of a 
computer program’s source code is typically much greater than the size of the grammar describing 
the programming language’s syntax, so that the grammar term is negligible; and second, compilers 
are constructed to analyze many different programs with respect to a single built-in grammar. 

However, in other domains these conditions do not hold. For example, in natural language, 
sentences are relatively short (not often longer than one hundred words) compared with the size 
of the grammar: Johnson (1998) describes a (probabilistic) CFG for a subset of English that has
22,773 rules. Indeed, Joshi (1997) notes that “the real limiting factor in practice is the size of the
grammar”. Therefore, it is reasonable to include in the analysis of parsing time the dependence on
the grammar size, and we will do so here. As a point of information, we note that both CKY and
Earley’s algorithm can be implemented to run in time $O(|G|n^3)$ (Graham, Harrison, and Ruzzo,
1980), although CKY requires the input grammar to be in Chomsky normal form, conversion to
which may cause a quadratic increase in the number of productions in the grammar (Hopcroft and
Ullman, 1979).


3 The reduction 


In this section, we provide two efficient reductions of Boolean matrix multiplication to c-parsing, 
thus proving that any c-parsing algorithm can be used as a Boolean matrix multiplication algorithm
with little computational overhead. The first reduction produces a string and a context-free
grammar; the second is a modification of the first in which the grammar produced is in Chomsky
normal form. The techniques we use are an adaptation of Satta’s (1994) elegant reduction of
Boolean matrix multiplication to tree-adjoining grammar (TAG) parsing. However, Satta’s results 
rely explicitly on properties of TAGs that allow them to generate non-context-free languages, and 
so cannot be directly applied to CFGs.

Figure 3: Converting a c-parser into a BMM algorithm.


3.1 Boolean matrix multiplication 

A Boolean matrix is a matrix with entries from the set {0,1}. A Boolean matrix multiplication
(BMM) algorithm takes as input two m × m Boolean matrices A and B and returns their Boolean
product A × B, which is the m × m Boolean matrix C whose entries are defined by

$$c_{ij} = \bigvee_{k=1}^{m} (a_{ik} \wedge b_{kj}).$$

That is, $c_{ij} = 1$ if and only if there exists a number k, $1 \le k \le m$, such that $a_{ik} = b_{kj} = 1$.

As noted above, the Boolean product C can be computed via standard matrix multiplication,
since $c_{ij} = 1$ if and only if $\sum_{k=1}^{m} a_{ik} \cdot b_{kj} > 0$. This means that we can use the Coppersmith and
Winograd (1990) general matrix multiplication algorithm to calculate the Boolean matrix product
of two m × m Boolean matrices in time $O(m^{2.376})$. To our knowledge, the asymptotically fastest
algorithms for BMM all rely on general matrix multiplication; the fastest algorithms that do not
do so are the so-called “four Russians” algorithm (Arlazarov et al., 1970), with worst-case running
time $O(m^3/\log m)$, and Rytter’s (1985) variant, which uses compression to reduce the time to
$O(m^3/\log^2 m)$.
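
The two characterizations of the Boolean product can be sketched in Python as follows. This is
only the cubic-time textbook computation, written to show that thresholding an ordinary integer
product gives the same matrix as the definition; the function names are illustrative, and
matrices are assumed to be square lists of lists with 0/1 entries.

```python
def boolean_product(A, B):
    """Boolean product by the definition: c_ij = OR over k of (a_ik AND b_kj)."""
    m = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(m)))
             for j in range(m)] for i in range(m)]


def boolean_product_via_integers(A, B):
    """Same product via ordinary integer multiplication, then thresholding at 0."""
    m = len(A)
    return [[int(sum(A[i][k] * B[k][j] for k in range(m)) > 0)
             for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    A = [[1, 0], [0, 0]]
    B = [[0, 1], [1, 0]]
    print(boolean_product(A, B))
    print(boolean_product_via_integers(A, B))
```

Any faster algorithm for the integer product can be substituted for the inner sum without
changing the thresholding step.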


3.2 The reduction: first version 

Our goal in this section is to show that Boolean matrix multiplication can be efficiently reduced to 
c-parsing of CFGs. That is, we will describe a simple procedure that takes as input an instance of 
the BMM problem and converts it into an instance of the CFG parsing problem with the following 
property: any c-parsing algorithm run on the new parsing problem yields output from which it 
is easy to determine the answer to the original BMM problem. We therefore demonstrate that 
any c-parser can be used to solve Boolean matrix multiplication via the three-step process shown 
schematically in Figure 3.

Thus, given two Boolean matrices A and B, we need to show how to produce a grammar G and a
string w such that c-parsing w with respect to G yields output $\mathcal{F}_{G,w}$ from which information about
the Boolean product C = A × B can be easily retrieved. Our approach will be to encode almost
all the information about A and B in the grammar.

We can sketch the desired behavior of the grammar G as follows. Suppose entries $a_{ik}$ in A
and $b_{kj}$ in B are both 1. Assume we have some way to break up array indices into two parts so
that i can be reconstructed from $i_1$ and $i_2$, j can be reconstructed from $j_1$ and $j_2$, and k can be
reconstructed from $k_1$ and $k_2$ (we will describe a way to do this later; the motivation is to keep the
grammar size relatively small). Then, our grammar will permit the following derivation sequence:

$$C_{i_1,j_1} \Rightarrow A_{i_1,k_1} B_{k_1,j_1} \Rightarrow^* \underbrace{w_{i_2} \cdots w_{k_2+\delta}}_{\text{derived by } A_{i_1,k_1}} \; \underbrace{w_{k_2+\delta+1} \cdots w_{j_2+2\delta}}_{\text{derived by } B_{k_1,j_1}},$$

where $\delta$ will be defined later. The key thing to observe is that $C_{i_1,j_1}$ generates two nonterminals
whose “inner” indices match, and that these two nonterminals generate substrings that lie exactly
next to each other. The “inner” indices constitute a check on $k_1$, and substring adjacency constitutes
a check on $k_2$; together, these two checks serve as a proof that $a_{ik} = b_{kj} = 1$, and hence that $c_{ij}$ is
also 1.

We now set up some notation. Let A and B be two Boolean matrices, each of size m × m, and
let C be their Boolean matrix product. In the rest of this section, we consider A, B, C, and m to
be fixed. Set $d = \lceil m^{1/3} \rceil$, and set $\delta = d + 2$. (The effect of these choices on the efficiency of our
reduction is discussed in section 3.5.1.) We will be constructing a string of length $3\delta$; we choose $\delta$
slightly larger than d in order to avoid having epsilon-productions in our grammar.

Our index encoding function is as follows. Let i be a matrix index, $1 \le i \le m \le d^3$. Then, we
define the function $f(i) = (f_1(i), f_2(i))$ by

$f_1(i) = \lfloor i/d \rfloor$ (so that $0 \le f_1(i) \le d^2$), and

$f_2(i) = (i \bmod d) + 2$ (so that $2 \le f_2(i) \le d + 1$).

Since $f_1(i)$ and $f_2(i)$ are essentially the quotient and remainder of integer division of i by d, we can
reconstruct i from $(f_1(i), f_2(i))$. It may be helpful to think of these two quantities as “high-order”
and “low-order” bits, respectively. For convenience, we will employ the notational shorthand of
using subscripts instead of the functions $f_1$ and $f_2$; that is, we write $i_1$ and $i_2$ for $f_1(i)$ and $f_2(i)$.
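
The encoding and its inverse can be sketched in a few lines of Python. The helper names
`cube_root_ceiling`, `encode`, and `decode` are ours; the integer loop for d is used only to
avoid floating-point rounding when taking cube roots.

```python
def cube_root_ceiling(m: int) -> int:
    """Smallest integer d with d**3 >= m, i.e. d = ceil(m^(1/3))."""
    d = 1
    while d ** 3 < m:
        d += 1
    return d


def encode(i: int, d: int) -> tuple[int, int]:
    """f(i) = (f1(i), f2(i)) = (floor(i / d), (i mod d) + 2)."""
    return i // d, (i % d) + 2


def decode(i1: int, i2: int, d: int) -> int:
    """Invert encode: undo the +2 offset, then recombine quotient and remainder."""
    return i1 * d + (i2 - 2)


if __name__ == "__main__":
    m = 20
    d = cube_root_ceiling(m)
    for i in range(1, m + 1):
        print(i, encode(i, d))
```

Because the low-order part is a remainder shifted by 2, it never takes the values 0 or 1, which
leaves room at the left end of the string for the non-empty padding used later.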

It is now our job to create a CFG $G = (\Sigma, V, R, S)$ and a string $w \in \Sigma^*$ that encode information
about A and B and express constraints about their product C.

We choose the set of terminals to be $\Sigma = \{w_\ell : 1 \le \ell \le 3d + 6\}$. The string we choose is
extremely simple, and in fact doesn’t depend on A or B at all: we set $w = w_1 w_2 \cdots w_{3d+6}$. We
consider w to be made up of three parts, x, y, and z, each of size $\delta$:

$$w = \underbrace{w_1 w_2 \cdots w_{d+2}}_{x} \; \underbrace{w_{d+3} \cdots w_{2d+4}}_{y} \; \underbrace{w_{2d+5} \cdots w_{3d+6}}_{z}.$$

Observe that for any array index i between 1 and m, it is the case that $w_{i_2}$ appears in x, $w_{i_2+\delta}$
appears in y, and $w_{i_2+2\delta}$ appears in z, since

$i_2 \in [2, d + 1]$,
$i_2 + \delta \in [d + 4, 2d + 3]$, and
$i_2 + 2\delta \in [2d + 6, 3d + 5]$.

We now turn our attention to constructing the grammar G. Our plan is to include a set of
nonterminals $\{C_{p,q} : 1 \le p, q \le d^2\}$ in V such that $c_{ij} = 1$ if and only if $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$.


3.3 The grammar 

To create $G = (\Sigma, V, R, S)$, we build up the set of nonterminals and productions, starting with
$V = \{S\}$ and $R = \emptyset$. We add nonterminal W to V for generating arbitrary non-empty substrings
of w, and therefore add productions

$W \to w_\ell W \mid w_\ell$, $1 \le \ell \le 3d + 6$. (W-rules)

Figure 4: Schematic of the derivation process when $a_{ik} = b_{kj} = 1$. The substrings derived by
$A_{i_1,k_1}$ and $B_{k_1,j_1}$ lie right next to each other.


Next, we encode the entries of the input matrices A and B in our grammar. We add the
nonterminals from the sets $\{A_{p,q} : 1 \le p, q \le d^2\}$ and $\{B_{p,q} : 1 \le p, q \le d^2\}$. Then, for every
non-zero entry $a_{ij}$ in A, we add the production

$A_{i_1,j_1} \to w_{i_2} W w_{j_2+\delta}$. (A-rules)

For every non-zero entry $b_{ij}$ in B, we add the production

$B_{i_1,j_1} \to w_{i_2+1+\delta} W w_{j_2+2\delta}$. (B-rules)

To represent the entries of C, we add the nonterminals from the set $\{C_{p,q} : 1 \le p, q \le d^2\}$ and
include productions

$C_{p,q} \to A_{p,r} B_{r,q}$, $1 \le p, q, r \le d^2$. (C-rules)

Finally, we complete the construction with productions for the start symbol S:

$S \to W C_{p,q} W$, $1 \le p, q \le d^2$. (S-rules)

We now prove the following result about the grammar and string we have just described.

Theorem 1 For $1 \le i, j \le m$, the entry $c_{ij}$ in C is non-zero if and only if $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$.


Proof. Fix i and j. 

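Let us prove the “only if” direction first. Thus, suppose $c_{ij} = 1$. Then there exists a k such
that $a_{ik} = b_{kj} = 1$. Figure 4 sketches how $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$.

Claim 1 $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$.

The production $C_{i_1,j_1} \to A_{i_1,k_1} B_{k_1,j_1}$ is one of the C-rules in our grammar. Since $a_{ik} = 1$,
$A_{i_1,k_1} \to w_{i_2} W w_{k_2+\delta}$ is one of our A-rules, and since $b_{kj} = 1$, $B_{k_1,j_1} \to w_{k_2+1+\delta} W w_{j_2+2\delta}$ is one
of our B-rules. Finally, since $i_2 + 1 \le (k_2 + \delta) - 1$ and $(k_2 + 1 + \delta) + 1 \le (j_2 + 2\delta) - 1$, we have
$W \Rightarrow^* w_{i_2+1}^{k_2+\delta-1}$ and $W \Rightarrow^* w_{k_2+2+\delta}^{j_2+2\delta-1}$, since both substrings are of length at least one. Therefore,

$$C_{i_1,j_1} \Rightarrow A_{i_1,k_1} B_{k_1,j_1} \Rightarrow^* \underbrace{w_{i_2} W w_{k_2+\delta}}_{\text{derived by } A_{i_1,k_1}} \; \underbrace{w_{k_2+1+\delta} W w_{j_2+2\delta}}_{\text{derived by } B_{k_1,j_1}} \Rightarrow^* w_{i_2}^{j_2+2\delta}.$$

□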

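Claim 2 $S \Rightarrow^* w_1^{i_2-1} C_{i_1,j_1} w_{j_2+2\delta+1}^{3d+6}$.

This claim is essentially trivial, since by the definition of the S-rules, we know that $S \Rightarrow W C_{i_1,j_1} W$.
We need only show that neither $w_1^{i_2-1}$ nor $w_{j_2+2\delta+1}^{3d+6}$ is the empty string (and hence can be derived
by W); since $1 \le i_2 - 1$ and $j_2 + 2\delta + 1 \le 3d + 6$, the claim holds. □

Claims 1 and 2 together prove that $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$, as required.^2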

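Next we prove the “if” direction. Suppose $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$, which by definition means
$C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$. This can only arise through the application of a C-rule:

$$C_{i_1,j_1} \Rightarrow A_{i_1,k'} B_{k',j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$$

for some k'. It must be the case that for some $\ell$, $A_{i_1,k'} \Rightarrow^* w_{i_2}^{\ell}$ and $B_{k',j_1} \Rightarrow^* w_{\ell+1}^{j_2+2\delta}$. But then
we must have the productions $A_{i_1,k'} \to w_{i_2} W w_\ell$ and $B_{k',j_1} \to w_{\ell+1} W w_{j_2+2\delta}$ with $\ell = k'' + \delta$
for some k''. But we can only have such productions if there exists a number k such that $k_1 = k'$,
$k_2 = k''$, $a_{ik} = 1$, and $b_{kj} = 1$; and this implies that $c_{ij} = 1$. ■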

Examination of the proof reveals that we also have the following two corollaries. 

Corollary 1 For $1 \le i, j \le m$, $c_{ij} = 1$ if and only if $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$. Hence, c-derivation and
derivation are equivalent for the $C_{p,q}$ nonterminals.

Corollary 2 $S \Rightarrow^* w$ if and only if C is not the all-zeroes matrix.

Let us now calculate the size of G. V consists of roughly $3((d^2)^2) \approx 3m^{4/3}$ nonterminals. R
contains about 6d W-rules and $(d^2)^2 \approx m^{4/3}$ S-rules. There are at most $m^2$ A-rules, since we have
A-rules only for each non-zero entry in A; similarly, there are at most $m^2$ B-rules. And lastly, there
are $(d^2)^3 \approx m^2$ C-rules. Therefore, our grammar is of size $O(m^2)$ with a very small constant factor;
considering that G encodes m × m matrices A and B, it is not possible to shrink this much further.

^2 This proof would have been simpler if we had allowed W to derive the empty string. However, we avoid
epsilon-productions in order to facilitate the conversion to Chomsky normal form discussed in the next section.

$W \to W_\ell W \mid w_\ell$    ($1 \le \ell \le 3d + 6$)
$W_\ell \to w_\ell$    ($1 \le \ell \le 3d + 6$)
$A_{i_1,j_1} \to W_{i_2} Y_{j_2+\delta}$    (one for each nonzero entry $a_{ij}$ in A)
$Y_{j_2+\delta} \to W W_{j_2+\delta}$    ($2 \le j_2 \le d + 1$)
$B_{i_1,j_1} \to W_{i_2+1+\delta} Z_{j_2+2\delta}$    (one for each nonzero entry $b_{ij}$ in B)
$Z_{j_2+2\delta} \to W W_{j_2+2\delta}$    ($2 \le j_2 \le d + 1$)
$C_{p,q} \to A_{p,r} B_{r,q}$    ($1 \le p, q, r \le d^2$)
$S \to W T$
$T \to C_{p,q} W$    ($1 \le p, q \le d^2$)

Figure 5: A Chomsky normal form version of the productions of the grammar from the previous
section.


3.4 Chomsky normal form 


We would like our results to cover as large a class of parsers as possible. Some parsers, such as CKY, 
require the input grammar to be in Chomsky normal form (CNF), that is, where the right-hand 
side of every production consists of either exactly two nonterminals or exactly a single terminal. 
We therefore wish to construct a CNF version G' of G. However, not only do we want Theorem 1
to hold for G' as well as G, but, in order to preserve time bounds, we also desire that $|G'| = O(|G|)$.

Unfortunately, the standard algorithm for converting CFGs to CNF can yield a quadratic blow-up
in the number of productions in the grammar (Hopcroft and Ullman, 1979) and thus is clearly
unsatisfactory for our purposes. However, since G contains no epsilon-productions or unit
productions, it is easy to convert G by adding a small number of record-keeping nonterminals and
productions, with the resultant grammar G' having very similar parse trees; in particular, the set
of substrings that are c-derived by the $C_{p,q}$ nonterminals is the same in each grammar. Figure 5
gives the productions of G'. Note that G' has only $O(d)$ more productions and nonterminals, and
so $|G'| = O(m^2)$ as well.


3.5 Time bounds 

We are now in a position to show the relation between time bounds for Boolean matrix
multiplication and time bounds for CFG parsing.

Theorem 2 Any c-parser P with running time $O(T(g)t(n))$ on grammars of size g and strings of
length n can be converted into a BMM algorithm $M_P$ that runs in time $O(\max(m^2, T(m^2)t(m^{1/3})))$.
In particular, if P takes time $O(gn^{3-\epsilon})$, then $M_P$ runs in time $O(m^{3-\epsilon/3})$.

Proof. $M_P$ acts as sketched in Figure 3. More precisely, given two Boolean m × m matrices A and
B, it constructs G (or G', as required) and w as described above. It feeds G and w to P, which
outputs oracle $\mathcal{F}_{G,w}$. To compute the product matrix C, $M_P$ requests from the oracle the value of
$\mathcal{F}_{G,w}(C_{i_1,j_1}, i_2, j_2 + 2\delta)$ (that is, whether or not $C_{i_1,j_1}$ derives or c-derives^3 $w_{i_2}^{j_2+2\delta}$) for each i and
j, $1 \le i, j \le m$, setting $c_{ij}$ to one if and only if the answer is “yes”.

^3 By Corollary 1, the two notions are equivalent in this case.

The running time of $M_P$ is computed as follows. It takes $O(m^2)$ time to read the two input
matrices. Since G is of size $O(m^2)$ and $|w| = O(m^{1/3})$, it takes $O(m^2)$ time to build the input
to P, which then computes $\mathcal{F}_{G,w}$ in time $O(T(m^2)t(m^{1/3}))$. Retrieving C takes $O(m^2)$ since, by
definition of c-parser, each query to the oracle takes constant time. So the total time spent by $M_P$
is $O(\max(m^2, T(m^2)t(m^{1/3})))$, as claimed.

Note that if we redefine c-parsing so that oracle queries take $f(g, n)$ time instead of constant
time, where g is the size of the grammar and n is the length of the string, then the bound changes
to $O(\max(m^2 f(g, n), T(m^2)t(m^{1/3})))$; as long as f is polylogarithmic, the second argument of the
maximum in the bound surely dominates.

In the case where $T(g) = g$ and $t(n) = n^{3-\epsilon}$, $M_P$ has a running time of $O(m^2 (m^{1/3})^{3-\epsilon}) = O(m^{3-\epsilon/3})$.
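
The three-step procedure $M_P$ can be sketched end to end in Python. This is an illustration
under stated assumptions, not the paper's implementation: a plain CKY chart stands in for the
arbitrary c-parser P (CKY tracks derivations, which suffices by Corollary 1); the S- and T-rules
of the Chomsky normal form grammar are omitted because only the $C_{p,q}$ nonterminals are
queried; the high-order indices range over the values that $f_1$ actually takes, including 0;
and all function names are ours.

```python
def encode(i, d):
    """Split a 1-based index into (high, low) parts: (i // d, i % d + 2)."""
    return i // d, i % d + 2


def build_cnf_rules(A, B, d):
    """Binary productions of the CNF grammar, indexed by right-hand side.

    The preterminal W_l is written ("w", l).
    """
    m, delta, n = len(A), d + 2, 3 * d + 6
    rules = {}  # (X, Y) -> set of left-hand sides Z with Z -> X Y

    def add(lhs, x, y):
        rules.setdefault((x, y), set()).add(lhs)

    for l in range(1, n + 1):                      # W -> W_l W
        add("W", ("w", l), "W")
    for j2 in range(2, d + 2):                     # Y-rules and Z-rules
        add(("Y", j2 + delta), "W", ("w", j2 + delta))
        add(("Z", j2 + 2 * delta), "W", ("w", j2 + 2 * delta))
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            (i1, i2), (j1, j2) = encode(i, d), encode(j, d)
            if A[i - 1][j - 1]:                    # A-rules
                add(("A", i1, j1), ("w", i2), ("Y", j2 + delta))
            if B[i - 1][j - 1]:                    # B-rules
                add(("B", i1, j1), ("w", i2 + 1 + delta), ("Z", j2 + 2 * delta))
    highs = sorted({encode(i, d)[0] for i in range(1, m + 1)})
    for p in highs:                                # C-rules
        for q in highs:
            for r in highs:
                add(("C", p, q), ("A", p, r), ("B", r, q))
    return rules, n


def cky_chart(rules, n):
    """chart[(i, j)] = nonterminals deriving w_i ... w_j (1-based, inclusive)."""
    chart = {(l, l): {"W", ("w", l)} for l in range(1, n + 1)}
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            cell = set()
            for k in range(i, j):
                for x in chart[(i, k)]:
                    for y in chart[(k + 1, j)]:
                        cell |= rules.get((x, y), set())
            chart[(i, j)] = cell
    return chart


def bmm_via_parsing(A, B):
    """Boolean product of square 0/1 matrices A and B via the reduction."""
    m = len(A)
    d = 1
    while d ** 3 < m:                              # d = ceil(m^(1/3))
        d += 1
    delta = d + 2
    rules, n = build_cnf_rules(A, B, d)
    chart = cky_chart(rules, n)
    C = [[0] * m for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            (i1, i2), (j1, j2) = encode(i, d), encode(j, d)
            if ("C", i1, j1) in chart[(i2, j2 + 2 * delta)]:
                C[i - 1][j - 1] = 1
    return C


if __name__ == "__main__":
    print(bmm_via_parsing([[1, 0], [0, 0]], [[0, 1], [1, 0]]))
```

Since CKY itself is cubic in the string length, this sketch does not beat the standard BMM
algorithm; it only shows how the grammar, the string, and the oracle queries fit together.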


The case in which P takes time linear in the grammar size is of the most interest, since, as 
mentioned above, in natural language processing applications the grammar tends to be far larger 
than the strings to be parsed. In this case, our result directly converts any improvement in the 
exponent for CFG parsing to a reduction in the exponent for BMM. For example, observe that 
Theorem 2 translates the running time of the standard CFG parsers, $O(gn^3)$, into the running time
of the standard BMM algorithm, $O(m^3)$. Also, a c-parser with running time $O(gn^{2.43})$ would yield
a matrix multiplication algorithm rivalling that of Strassen (1969), and a c-parser with running
time better than $O(gn^{1.12})$ could be converted into a BMM method faster than that of Coppersmith
and Winograd (1990). As per the discussion above, even if such parsers exist, they would in all
likelihood not be very practical.
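
The exponent arithmetic behind these examples can be checked with a short Python sketch. It
assumes a parser running in time $O(gn^{e})$ with $e = 3 - \epsilon$ and applies the bound
$O(m^{3-\epsilon/3})$ from Theorem 2; the function names are illustrative, and exact fractions
are used to avoid rounding.

```python
from fractions import Fraction


def bmm_exponent(parser_exponent):
    """Exponent x of the O(m^x) BMM bound obtained from an O(g n^e) c-parser."""
    epsilon = 3 - parser_exponent
    return 3 - epsilon / 3


def parser_exponent_needed(target_bmm_exponent):
    """Parser exponent e for which the reduction yields the target BMM exponent."""
    epsilon = 3 * (3 - target_bmm_exponent)
    return 3 - epsilon


if __name__ == "__main__":
    for target in (Fraction(3), Fraction(281, 100), Fraction(2376, 1000)):
        print(float(target), float(parser_exponent_needed(target)))
```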


3.5.1 Parameter choices 


Since Valiant (1975) proved that an $O(m^{3-\epsilon})$ BMM algorithm can be transformed into a parser
with $O(n^{3-\epsilon})$ time complexity in the string length, it is natural to ask whether our technique could
yield the stronger result (if it is in fact true) that a CFG parser running in time $O(gn^{3-\epsilon})$ can be
converted into an $O(m^{3-\epsilon})$ BMM algorithm. We now explain why such a result cannot be obtained
by a straightforward modification of the reduction method we described above.

Our run-time results are based on a particular choice of where to divide matrix indices into
“high order bits” and “low order bits”; in particular, we set d, which parametrizes the number of
low order bits, to $d = \lceil m^{1/3} \rceil$. We determined this value by considering the effect of d on the size of
the resulting grammar and string: roughly speaking, a larger value shrinks the former but expands
the latter. For convenience, let us set $d = m^{\ell}$, and consider how to pick $\ell$.

Since combining the higher-order bits and the lower-order bits yields a matrix index of magnitude
at most m, it follows that the string has size $O(m^{\ell})$ and the grammar will have size
$O(m^2 + m^{3(1-\ell)})$ (the first term comes from the inclusion of the A- and B-rules, and the second
term comes from the fact that the C-rules have to include the higher-order bits for three
matrix indices). Hence, a parser with run-time complexity $O(gn^{3-\epsilon})$ yields a BMM algorithm with
run-time complexity $O(m^{2+\ell(3-\epsilon)} + m^{3(1-\ell)+\ell(3-\epsilon)})$. Inspection reveals that when $\ell > 1/3$, the first term
dominates; when $\ell < 1/3$, the second term dominates; and the lowest upper bound occurs at the
“crossing point” where $\ell = 1/3$.
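
The trade-off can be explored numerically with a small Python sketch. It evaluates the exponent
of the resulting BMM bound as a function of $\ell$ for a parser running in time
$O(gn^{3-\epsilon})$, using the grammar-size terms $m^2$ and $m^{3(1-\ell)}$ and the string length
$m^{\ell}$ from the paragraph above; the function name and the grid of candidate values are our
own choices.

```python
from fractions import Fraction


def reduction_exponent(ell, epsilon):
    """Exponent of m in the BMM bound when d = m^ell and the parser is O(g n^(3 - epsilon))."""
    string_exponent = ell
    grammar_exponents = (Fraction(2), 3 * (1 - ell))  # A/B-rules, C-rules
    return max(g + string_exponent * (3 - epsilon) for g in grammar_exponents)


if __name__ == "__main__":
    epsilon = Fraction(1)
    candidates = [Fraction(k, 30) for k in range(1, 30)]
    best = min(candidates, key=lambda ell: reduction_exponent(ell, epsilon))
    print(best, reduction_exponent(best, epsilon))
```

For any $\epsilon$ between 0 and 3 the first exponent increases with $\ell$ and the second
decreases, so the minimum of their maximum lies where they meet.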


4 Related results 

We have shown that the existence of a fast practical CFG parsing algorithm would yield a fast 
practical BMM algorithm. Given that fast practical BMM algorithms are thought not to exist, this
establishes a limitation on the efficiency of practical CFG parsing, and helps explain why there has
been very little success in developing practical sub-cubic general CFG parsers. 

There have been a number of related results regarding the time complexity of context-free 
grammar parsing and the relationship between this and other problems. We survey these results 
below. 

As mentioned above, the asymptotically fastest (although not practical) general context-free 
parsing algorithm is due to Valiant (1975), who showed that the problem can be reduced to Boolean
matrix multiplication (this is the “opposite direction” of the reduction we present). His algorithm
shows that the worst-case dependence of the speed of CFG parsing on the input string length
is $O(M(n))$, where $M(m)$ is the time it takes to multiply two $m \times m$ Boolean matrices together.
(Rytter (1995) provides an alternate version of this algorithm with the same asymptotic complexity.)

Methods for reducing Boolean matrix multiplication to context-free grammar parsing were 
previously considered by Ruzzo (1979). He proved that the problem of producing all possible parses
of a string of length $n$ with respect to a context-free grammar is at least as hard as multiplying
two $\sqrt{n} \times \sqrt{n}$ Boolean matrices together. His technique encodes most of the information about the
matrices in strings (as opposed to in the grammar, as in our method). Ruzzo’s result does not serve
to explain why practical sub-cubic CFG parsing algorithms have been so difficult to produce, since
using his reduction translates even a parser running in time proportional to $n^{1.5}$ to a cubic-time
BMM algorithm.

Harrison and Havel (Harrison and Havel, 1974; Harrison, 1978) note that there is a reduction
of $m \times m$ BMM checking to context-free recognition (a BMM checker takes as input three Boolean
matrices A, B, and C and reveals whether or not C is the Boolean product of A and B). These 
two decision problems are clearly related to the algorithmic problems we consider in this paper. 
However, this reduction, like Ruzzo’s, also converts a parser running in time proportional to $n^{1.5}$
to a cubic-time BMM checking algorithm, which, again, is not as strong a result as ours.
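To make the BMM checking problem concrete, here is a minimal sketch of a checker built on the naive cubic-time Boolean product. It only illustrates the decision problem defined above; it is not the Harrison and Havel reduction, and the function names `boolean_product` and `bmm_check` are hypothetical. Matrices are assumed to be square lists of lists with 0/1 entries.

```python
def boolean_product(a, b):
    """Naive cubic-time Boolean product of two square 0/1 matrices:
    entry (i, j) is 1 iff a[i][k] and b[k][j] are both 1 for some k."""
    m = len(a)
    return [
        [int(any(a[i][k] and b[k][j] for k in range(m))) for j in range(m)]
        for i in range(m)
    ]


def bmm_check(a, b, c):
    """Return True iff c is the Boolean product of a and b."""
    normalized_c = [[int(bool(x)) for x in row] for row in c]
    return boolean_product(a, b) == normalized_c


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    swap = [[0, 1], [1, 0]]
    print(bmm_check(identity, swap, swap))      # identity times swap is swap
    print(bmm_check(identity, swap, identity))  # identity is not the product
```

A checker only answers yes or no for a proposed product, which is why it pairs naturally with recognition (a decision problem) rather than with parsing.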

The problem of on-line CFL recognition is to proceed through each prefix $w_i$ of the input string
$w$, determining whether or not $w_i$ is generated by the input context-free grammar before reading
the next ($(i+1)$th) input symbol. The study of the complexity of this problem has a long history;
in fact, the landmark paper of Hartmanis and Stearns (1965) that introduced the notions of time
and space complexity contains an example of a CFL for which on-line recognition of strings of 
length n takes more than n steps. Currently, the best known lower bound for this problem is 
$\Omega(n^2/\log n)$ (Seiferas, 1986; Gallaire, 1969). However, on-line recognition is a more difficult task than
the standard CFL recognition problem (indeed, it is the extra constraints imposed by the on-line 
requirement that make it easier to prove lower bounds), and so these results do not translate to 
the usual recognition paradigm. To date, there are no non-trivial lower bounds known for general 
CFL recognition. 

Relationships between parsing other grammatical formalisms and multiplying Boolean matrices 
have also been explored. In particular, several researchers have looked at Tree Adjoining Grammar 
(TAG) (Joshi, Levy, and Takahashi, 1975), an elegant formalism based on modifying tree structures.
TAGs have strictly greater generative capacity than context-free grammars, but at the price of
being (apparently) harder to parse: standard algorithms run in time proportional to $n^6$, although
Rajasekaran and Yooseph (1995) adapt Valiant’s (1975) technique to get an asymptotically faster
parser using BMM. Satta (1994) gives a reduction of Boolean matrix multiplication to tree-adjoining
grammar parsing, demonstrating that any substantial improvement over $O(gn^6)$ for TAG parsing
would result in a sub-cubic BMM algorithm. Our reduction was inspired by Satta’s and resembles 
his in the way that matrix information is encoded in a grammar. However, Satta’s reduction 
explicitly relies on TAG properties that allow non-context-free languages to be generated, and so 
cannot be directly applied to CFG parsing.